Abstract

For a set of n points in the plane, we consider the axis-aligned (p, k)-Box Covering
problem: Find p axis-aligned, pairwise-disjoint boxes that together contain at least n − k
points. In this paper, we consider the boxes to be either squares or rectangles, and we want
to minimize the area of the largest box. For general p we show that the problem is NP-hard
for both squares and rectangles. For a small, fixed number p, we give algorithms that find the
solution in the following running times: For squares we have $O(n + k \log k)$ time for p = 1,
and $O(n \log n + k^p \log^p k)$ time for p = 2, 3. For rectangles we get $O(n + k^3)$ for p = 1 and
$O(n \log n + k^{2+p} \log^{p-1} k)$ time for p = 2, 3. In all cases, our algorithms use $O(n)$ space.

1 Introduction 

Motivated by clustering, we consider the problem of splitting a large set of points into a small 
number of groups. From a geometric point of view, we want to group points together that are 
'close' with respect to some distance measure. It is easy to see that the choice of distance measure 
directly influences the shape of the clusters. Depending on the application, it may be useful to 
consider only disjoint clusters. It is important to take noise into account, especially when dealing 
with raw data. That means, we may want to remove outliers that are 'far' from the clusters, or 
that would unduly influence their shape. 

In this paper, we consider the following optimization problem: Given a set P of n points in 
the plane and two integers p > 0 and k ≥ 0, find p pairwise-disjoint squares or rectangles that
together contain at least n − k points of P and minimize the largest area among the p squares or
rectangles. We treat the squares or rectangles as closed sets, and although we want them to be 
pairwise-disjoint, we allow overlap at their boundaries or corners. 

We call this problem the (p, k)-Square Covering and the (p, k)-Rectangle Covering
problem, respectively, according to the shape of the covering regions. The k points that are not 
covered by a solution of the problem are called outliers. 
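
To make the feasibility conditions concrete, the following minimal Python sketch checks whether a candidate set of boxes is a valid covering with at most k outliers and, if so, reports the objective value (the largest box area). The representation of a box as a tuple (x1, y1, x2, y2) and the function names are assumptions made for illustration; they do not come from the paper. The disjointness test is conservative for degenerate (zero-area) boxes.

```python
from itertools import combinations


def interiors_disjoint(a, b):
    """Closed boxes (x1, y1, x2, y2) may share boundary points but must not overlap otherwise."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # The boxes are separated (or merely touching) along the x-axis or the y-axis.
    return ax2 <= bx1 or bx2 <= ax1 or ay2 <= by1 or by2 <= ay1


def covers(box, pt):
    """A closed box contains the points on its boundary."""
    x1, y1, x2, y2 = box
    return x1 <= pt[0] <= x2 and y1 <= pt[1] <= y2


def objective(points, boxes, k):
    """Largest box area if `boxes` covers at least n - k points and is pairwise disjoint, else None."""
    if any(not interiors_disjoint(a, b) for a, b in combinations(boxes, 2)):
        return None
    covered = sum(any(covers(b, pt) for b in boxes) for pt in points)
    if covered < len(points) - k:
        return None
    return max((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes)
```

For squares one would additionally require x2 − x1 = y2 − y1 for every box; the sketch does not enforce this.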

Both problems are variations and/or extensions of the rectilinear p-center problem. This is 
usually considered as the problem of finding p congruent squares of smallest possible size that 
together contain all points of P, where the p squares may overlap. In our setting, however, we 
have (1) that the p regions must not overlap each other (except at their boundaries) and (2) 
that up to a predefined number of k points are considered as outliers and can be ignored. It is 
known that the rectilinear p-center problem is NP-hard even to approximate within ratio 1.5 [18].
However, for p ≤ 4, worst-case optimal-time algorithms are known: linear time for p ≤ 3 and
$O(n \log n)$ time for p = 4. For p ≥ 5, the best known time bound is $O(n^{p-4} \log^5 n)$ [23].

For the (p, 0)-Rectangle Covering problem, less work has been done. Bespamyatnikh and
Segal [4] presented a deterministic $O(n \log n)$ time algorithm for p = 2, but no efficient algorithm
for p ≥ 3 is known. Several papers considered variations of the (2, 0)-Rectangle Covering
problem (e.g., arbitrary orientation and three or higher dimensions) and achieved efficient
algorithms; see for example [2l HU1 H3l [Ml 119) . 

Outliers can also be seen as violation of constraints: basically, the points in P are constraints 
to be covered by squares or rectangles in our problems and k of them are allowed to be violated. 
In this sense, there is a connection to geometric optimization with violated constraints which has 
been studied by several researchers. Matousek [TB] and Chan [5] presented efficient algorithms for 
LP-type problems allowing k violated constraints. The class of LP-type problems, which extends 
linear programming in a combinatorial sense, was introduced by Sharir and Welzl [22]. Also, a
deterministic linear-time algorithm for LP-type problems of finite LP-dimension is known [5]. The
LP-dimension is a parameter associated with an LP-type problem; for instance, the (1, 0)-Square 
Covering problem, or equivalently the rectilinear 1-center problem, has LP-dimension 3 since
the smallest unique enclosing square is determined by three points of the given point set. Indeed,
the rectilinear p-center problem for p ≤ 3 is known to be an LP-type problem [23], so linear-time
algorithms follow. Thus, the (1, k)-Square Covering problem can be solved in $O(n \log n + k^2 \log^2 n)$
time and the (1, k)-Rectangle Covering problem in O(n log n+k^n* log°^ n) time,
according to Chan [6]. For LP-dimension larger than four, no efficient algorithm has been found
to date. More details on LP-type problems can be found in Sharir and Welzl [35], Matoušek
and Škovroň [17], and Dyer et al. [11].

Independent of LP-type problems with violated constraints, there are some previous results 
dealing with outliers when p = 1. Aggarwal et al. [1] achieved a running time of $O((n-k)^2 n \log n)$
using $O((n-k)n)$ space for both the (1, k)-Square Covering and the (1, k)-Rectangle Covering
problems. Later, Segal and Kedem [21] gave an $O(n + k^2 (n-k))$ time algorithm for the
(1, k)-Rectangle Covering problem using $O(n)$ space. A randomized algorithm that runs in
$O(n \log n)$ time was given for the (1, k)-Square Covering problem by Chan [5]. Most recently,
Atanassov et al. [3] presented an $O(n + k^3)$ time algorithm for the (1, k)-Rectangle Covering
problem. 

Most of the above algorithms are optimal when the number of outliers is either a small constant 
or close to n. In this paper, we are interested in algorithms with small running time in k. Ideally, 
we would also like to preserve optimality in n for small k. We summarize the new results shown 
in this paper: 

• NP-hardness: In Section 3 we prove that both the (p, k)-Square Covering and the (p, k)-Rectangle
Covering problems are NP-hard when p is part of the input, even for a fixed
k ≥ 0. These are the first NP-hardness proofs for a variant of the rectilinear p-center
problem where the covering regions are disjoint and also for the problem of covering points 
by p rectangles. 

• Efficient algorithms for small p: In Section 4 we give efficient algorithms if the number of
boxes is small. All our algorithms use linear space. The running times of our algorithms
are summarized in Table 1. Recall that the previously best known results for this problem
with outliers were restricted to only one box: $O(n \log n)$ for the (1, k)-Square Covering
problem [5], and $O(n + k^3)$ for the (1, k)-Rectangle Covering problem [3].

Table 1: Running times of our (p, k)-Square/Rectangle Covering algorithms

         Squares                           Rectangles
p = 1    $O(n + k \log k)$                 $O(n + k^3)$
p = 2    $O(n \log n + k^2 \log^2 k)$      $O(n \log n + k^4 \log k)$
p = 3    $O(n \log n + k^3 \log^3 k)$      $O(n \log n + k^5 \log^2 k)$

2 A lower bound



We consider the (p, k)-Square Covering and the (p, k)-Rectangle Covering problem. Given
a set P of n points in the plane, and two integers k ≥ 0 and p > 0, find p axis-aligned pairwise-disjoint
(overlap of boundaries is allowed), closed squares or rectangles, that together cover at
least n − k points of P, such that the area of the largest square or rectangle is minimized. We
refer to the k points that are not contained in the union of all squares or rectangles as outliers. 

The algorithms we present in Section 4 are efficient, as we can show the following lower bound
that holds for both the (p, k)-Square Covering and the (p, k)-Rectangle Covering problem.

Lemma 1. Let k ∈ N be part of the input and let p be any fixed positive integer. Then, both
the (p, k)-Square Covering and the (p, k)-Rectangle Covering problem have an $\Omega(n \log n)$
lower bound in the algebraic decision tree model.

Proof: We reduce from 1-dimensional set disjointness: Given a sequence $S = \{r_1, \ldots, r_n\}$ of n
real numbers, we want to decide whether there is any repeated element in S. The following works
for both squares and rectangles.

Given the sequence S, we generate the point set $S' = \{(r_i, r_i) \mid 1 \le i \le n\} \subset \mathbb{R}^2$. We compute
the p minimal squares that cover S', allowing exactly k = n − p − 1 outliers, which means that
the union of the p squares must cover p + 1 points. Thus, the covering squares degenerate to
points (i.e., squares of side length zero) if and only if there is a repeated element in the sequence.
Otherwise, by the pigeonhole principle, one of the covering squares must cover at least two points
and hence has positive area. □
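
The reduction can be sketched for the case p = 1, where k = n − 2 and the single square must cover two points. The following Python sketch uses a quadratic brute force as a stand-in for an actual covering algorithm, purely to illustrate the mapping from the number sequence to the diagonal point set; the function names are hypothetical.

```python
from itertools import combinations


def min_square_side_covering_two(points):
    """Brute force for p = 1: smallest side of an axis-aligned square containing two of the points."""
    return min(
        max(abs(a[0] - b[0]), abs(a[1] - b[1]))
        for a, b in combinations(points, 2)
    )


def has_repeated_element(values):
    """Decide repetition through the covering instance on the diagonal points (r_i, r_i)."""
    points = [(r, r) for r in values]
    # With k = n - 2 outliers, the optimal square degenerates to a point
    # exactly when two input numbers coincide.
    return min_square_side_covering_two(points) == 0
```

Any algorithm for the covering problem that beats the lower bound would therefore also decide repetition faster than the algebraic decision tree model allows.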

Similar bounds for slightly different problems were given by Chan [3] (p = 1) and by Segal [20] 
(p = 2, k = 0, arbitrary orientation). 

3 NP-Hardness Results 

In this section, we show that both the (p, k)-Square Covering and the (p, k)-Rectangle Covering
problems are NP-hard for any fixed k when p is part of the input. In the following, we
focus on the decision version of the two problems for k = 0: Given n points in the plane and
an integer p > 0, decide whether or not there exist p axis-aligned unit squares or p axis-aligned
rectangles of area at most one that together cover all points. We reduce from planar 3-SAT. Note
that we are not dealing explicitly with outliers. However, the reduction can be adapted by placing
k points at a sufficiently large distance from the other points so that they are not included in the covering.
Furthermore, note that our reductions work for all possible cases where the squares or rectangles 
may (not) overlap or need (not) be congruent. The optimal solutions may be different, however, 
depending on the underlying case. 

3.1 Covering Points with Squares 

In this section we study the complexity of the (p, k)-Square Covering problem: cover n − k
points in the plane with p axis-aligned squares while minimizing the area of the largest square. 

NP-hardness of the p-center problem (i.e., covering with congruent squares which are allowed
to overlap) has been shown previously by Fowler et al. [12], and by Meggiddo and Supowit [T5] . 
Here we show NP-hardness for the case of covering by congruent squares that must not overlap
(except at their boundaries). 

We reduce from planar 3-SAT: given a 3-CNF formula F with variables $x_1, \ldots, x_n$ and clauses
$c_1, \ldots, c_m$, let G(F) be the graph of F, defined as:

• $V = \{x_i \mid 1 \le i \le n\} \cup \{c_j \mid 1 \le j \le m\}$

• $E = \{(x_i, c_j) \mid x_i \in c_j \text{ or } \bar{x}_i \in c_j\}$

If G(F) is a planar graph, then F is called a planar 3-CNF formula. It is NP-hard to decide 
whether a given planar 3-CNF formula is satisfiable or not [15] . 
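
As an illustration of the definition of G(F), the following Python sketch builds the vertex and edge sets from a clause list. The DIMACS-style encoding (a negative integer −i stands for the negated variable x_i) and the function name are assumptions chosen for this example. The sketch does not test planarity, which would require a separate algorithm.

```python
def formula_graph(clauses):
    """Build G(F) for a CNF formula given as lists of nonzero ints (-i means negated x_i)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    # One vertex per variable and one vertex per clause.
    vertices = [("x", i) for i in variables]
    vertices += [("c", j) for j in range(1, len(clauses) + 1)]
    # An edge joins x_i and c_j if x_i occurs in c_j, negated or not.
    edges = {
        (("x", abs(lit)), ("c", j))
        for j, clause in enumerate(clauses, start=1)
        for lit in clause
    }
    return vertices, edges
```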

Figure 1: Left: Variable gadget consisting of 4N points that can be covered in two different
ways with 2N unit squares (either light or dark grey). Right: Clause gadget with 4M + 1 points
(including three link points, depicted as hollow circles). 2M boxes are necessary and sufficient to
cover all points except any one of the link points.



3.1.1 Reduction 

Given a planar 3-SAT instance, we construct a (p, 0)-Square Covering instance on a grid such 
that the 3-SAT instance is satisfiable if and only if all points can be covered by p unit squares. 
The reduction is as follows, with all points lying on a grid, such that the distance between 
two points in the same grid cell is one unit. 

• For each variable $x_i$, we create a gadget of 4N points arranged in a ring-like fashion (where
N is a sufficiently large constant). By construction, there are only two different ways of
covering all generated points with 2N unit squares (see Figure 1, left). We associate each
of the coverings to an assignment either of TRUE or FALSE to the literal, and define the
TRUE region as the union of squares in the TRUE assignment, and the FALSE region as
the union of squares in the FALSE assignment.

• For each clause $c_j$, we generate 4M + 1 points in a linear fashion, where M is another large
constant. There are three special link points in the gadget: the rightmost, leftmost and
middle points of the linear segment, depicted as hollow circles in Figure 1, right.

The main property of the clause gadget is the following: 

Lemma 2. To cover all points of a clause gadget except for any one of the three link points, 2M 
unit squares are sufficient and necessary. 

Proof: Figure 1, right, shows a covering of all points (except for the middle link point) with 2M
squares. By shifting the M rightmost (or leftmost) squares to the center, we can cover the middle 
link, but at the same time we uncover the right (or left) link point; therefore the upper bound 
holds. 

Consider any covering of all non-link points. These points form two sequences of equal length,
2M − 1 points each, to the left and right of the middle link point, and the two sequences are
more than unit distance apart. We need at least
$\lceil (2M - 1)/2 \rceil = M$ squares to cover each point sequence, thus the lower bound also holds. □
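
The count in the lower bound can be checked with a small greedy computation. The sketch below assumes that the points of one sequence are collinear and spaced one unit apart, so that a unit square covers at most two consecutive points; this spacing and the function name are assumptions made for illustration.

```python
import math


def min_unit_squares_for_run(num_points, spacing=1.0, side=1.0):
    """Greedy left-to-right count of closed intervals of length `side` covering equally spaced collinear points."""
    count, i = 0, 0
    while i < num_points:
        count += 1
        start = i * spacing
        # Absorb every further point that still fits into the current square.
        while i < num_points and i * spacing - start <= side:
            i += 1
    return count
```

For a run of 2M − 1 points the greedy count equals M, matching the bound used in the proof.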

We connect each clause gadget with its three corresponding variable gadgets as follows (see
Figure 2): from each link point of a clause $c_j$ we add a sequence of connecting points leading to
one variable. Let $e_{1,j}$ ($e_{2,j}$, $e_{3,j}$, resp.) be the total number of points added to connect clause
gadget $c_j$ with the variable gadget $x_1$ ($x_2$, $x_3$, resp.). We set each $e_{i,j}$ to be odd, which can always be
done by making the underlying grid sufficiently fine.

For each connection between clause gadget $c_j$ and the variable gadgets $x_1$, $x_2$ and $x_3$, we add
three additional points called switches $s_{1,j}$, $s_{2,j}$ and $s_{3,j}$. We put each switch between two points
of the outer boundary of the variable gadget, either in its FALSE or TRUE region, depending
on whether the associated literal is negated or not. This way the switch is already covered by a
square of the variable gadget if and only if the corresponding variable assignment makes the literal
TRUE. We say that the switch is on if it is covered by a square of the variable gadget, and off
otherwise. Figure 2 shows how to connect the clause gadget $c_j$ with the three variable gadgets
when the specific assignment of truth values is TRUE for $x_1$, $x_3$ and FALSE for $x_2$.

Figure 2: Connection between a clause gadget and its corresponding variable gadgets (switches
depicted as crosses and links as hollow circles) for the clause $c_j = \bar{x}_1 \vee \bar{x}_2 \vee x_3$.
In the clause $c_j$, $x_1$ and $x_2$ are negated, so their switches lie in the FALSE region, whereas $x_3$
is not negated in $c_j$, so its switch lies in the TRUE region. The assignment $x_1$, $x_3$ = TRUE
(light grey) and $x_2$ = FALSE (dark grey), which satisfies the clause $c_j$, leads to a covering of
all connecting points and the clause gadget.

Lemma 3. Any clause gadget $c_j$ and its connecting points can be covered with $2M + \sum_{i=1}^{3} \lceil e_{i,j}/2 \rceil$
unit squares if and only if at least one switch is on.

Proof: Consider the covering of the connecting points when the corresponding switch is off, i.e., 
it is not covered by a square of the associated variable gadget. In this case, the first square of 
the connection must cover both the switch and the first connecting point. The following squares 
cover the second and third connecting points, etc. Since the number of connecting points is odd, 
the last square covers the last two connecting points. 

If the switch $s_{i,j}$ is on, i.e., it lies in the covering of the variable, then the first square of the
connection can be moved to cover the first and second connecting points, the second square covers
the third and fourth connecting points, and the last square covers the last connecting point and
the i-th link point of the clause gadget $c_j$.

Clearly $\sum_{i=1}^{3} \lceil e_{i,j}/2 \rceil$ squares are necessary to cover all connecting points, thus the remainder
of this lemma follows directly from Lemma 2. □

Since G(F) is planar, there exists an embedding of our construction so that no two connections 
overlap. Furthermore, since N is large (in particular larger than the degree of G(F)), we can place 
switches far away from each other (i.e., more than two units away from each other) so that the 
associated coverings are independent. Using the lemma above we derive the following lemma: 

Lemma 4. A planar 3-SAT formula is satisfiable if and only if the associated point covering
problem instance can be covered with 2nN + 2mM + E unit squares, where $E = \sum_{j=1}^{m} \sum_{i=1}^{3} \lceil e_{i,j}/2 \rceil$.

Proof: (⇐): Consider any covering of the points. Using Lemma 3 and the pigeonhole principle,
2nN unit squares are needed to cover all variable gadgets and at least 2mM + E unit squares 
are necessary to cover all clause gadgets (including the connecting points and switches). Thus, 
each variable must be covered with exactly 2N squares and each clause must use exactly
$2M + \sum_{i=1}^{3} \lceil e_{i,j}/2 \rceil$ squares.

In particular, the covering for the variables is fixed; hence any covering gives a valid variable 
assignment. By Lemma 3 we get that at least one switch must be on for each clause. This
corresponds to each clause Cj being satisfied at least once; thus the 3-SAT instance as a whole is 
satisfied. 

(⇒): Given a satisfying variable assignment, we generate the corresponding covering. By construction,
each clause $c_j$ must have at least one switch on, therefore the gadget of $c_j$ (and its connecting
points) can be covered using $2M + \sum_{i=1}^{3} \lceil e_{i,j}/2 \rceil$ squares. □
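
The square budget appearing in Lemma 4 is a simple function of the gadget sizes. The following Python sketch computes 2nN + 2mM + E from the connection lengths; the function name and the representation of the lengths as one triple per clause are assumptions made for illustration.

```python
def square_budget(n_vars, N, M, connection_lengths):
    """Return 2nN + 2mM + E, where connection_lengths[j] = (e_1j, e_2j, e_3j)."""
    m = len(connection_lengths)
    if any(e % 2 == 0 for triple in connection_lengths for e in triple):
        raise ValueError("the construction requires every e_ij to be odd")
    # (e + 1) // 2 equals ceil(e / 2) for positive integers e.
    E = sum((e + 1) // 2 for triple in connection_lengths for e in triple)
    return 2 * n_vars * N + 2 * m * M + E
```

For example, three variables with N = 10 and one clause with M = 5 and connection lengths (3, 5, 7) give 60 + 10 + 9 = 79 squares.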

The following lemma on hardness of approximation follows from our construction above: 

Lemma 5. If the 3-SAT formula is not satisfiable, any covering with 2nN + 2mM + E squares 
has at least one square with area at least 9/4. 

Proof: By construction, all points have integer coordinates (half-integer if the point is a switch).
That is, all points can be written as p = (u + k/2, v + k/2), where u, v ∈ N and k ∈ {0, 1}. Assume
that there exists a covering whose largest square has area strictly smaller than 9/4 (i.e., the
largest square has side length smaller than 3/2). Given any square covering of the construction, we
shrink each square until it has two points on opposite sides of the boundary, without uncovering
any points. By shrinking the squares, we set the side length of each square to the difference in
either x- or y-coordinates of some two points of the construction, which is a multiple of 1/2.
Since the formula is not satisfiable, by Lemma 4 it is not possible to find a covering with unit
squares, so some square has side length larger than one, and the next possible side length is 3/2. □
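
The gap between side length one and side length 3/2 can be written out explicitly. The following worked step assumes, as in the proof, that every coordinate is a multiple of 1/2; the symbol s for the side length of a shrunken square is introduced here for illustration only.

```latex
\[
  s = |a - b| \ \text{ for coordinates } a, b \in \tfrac{1}{2}\mathbb{Z}
  \;\Longrightarrow\;
  s \in \{0, \tfrac{1}{2}, 1, \tfrac{3}{2}, \ldots\},
  \qquad
  s > 1 \;\Longrightarrow\; s \ge \tfrac{3}{2}
  \;\Longrightarrow\; s^2 \ge \tfrac{9}{4} = 2.25 .
\]
```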

We conclude this section with the following theorem: 

Theorem 1. Given n points in the plane, let p ∈ N be part of the input and let k be any fixed
integer with n − k ∈ Ω(n). Then, the (p, k)-Square Covering problem is NP-hard. Moreover,
it is NP-hard to find an approximate solution within ratio 2.25.

3.2 Covering Points with Rectangles 

In this section we show NP-hardness for the (p, k)-Rectangle Covering problem. Note that
by making an affine transformation of the previous reduction for squares, we can easily obtain 
hardness for coverings with rectangles of any fixed ratio. However, the reduction does not work 
for arbitrary rectangles, since in this case we can cover each variable gadget with eight horizontal 
and vertical segments of zero area (i.e., arbitrarily thin rectangles). By doing so, all switches will 
be on, regardless of the variable assignment, and the reduction fails. Hence, we need a different 
reduction for the (p, k)-Rectangle Covering problem. Again, we reduce from planar 3-SAT,
and focus on the decision version of the problem for k = 0. We call an axis-aligned rectangle a 
unit rectangle if its area is at most one, and p unit rectangles form a unit covering if they together 
cover all points. 

3.2.1 Staircase sequences 

For our reduction, we need the notion of staircase sequences: 

Definition 1. A sequence $S = (p_1, \ldots, p_{2N})$ of 2N points in the plane is a staircase sequence if
and only if it satisfies the following properties:

• For any integer 0 < i ≤ N, two consecutive points $p_{2i}$ and $p_{2i+1}$ of the sequence have the
same x-coordinate and two consecutive points $p_{2i-1}$ and $p_{2i}$ have the same y-coordinate (we
assume the sequence is closed and set $p_{2N} = p_0$).

• No unit rectangle covers any two non-consecutive points of S. 

Figure 3: Staircase sequence of 2N points, labeled $p_1, \ldots, p_{2N}$. Selecting either all horizontal
or all vertical segments is the only way of covering the sequence with N unit rectangles.

We call n staircase sequences $S_1, \ldots, S_n$ mutually independent if no unit rectangle contains points
of more than one sequence.

We will consider a covering of points that can be decomposed into mutually independent stair- 
case sequences. By definition, no unit rectangle can include points of two independent sequences, 
thus the coverings of each sequence can be considered independently. 
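
Both notions can be checked mechanically. The following Python sketch tests the two properties of a staircase sequence and the mutual independence of several sequences. It relies on the observation that two points fit into a unit rectangle (area at most one) exactly when their axis-aligned bounding box has area at most one; the function names and the list-of-tuples representation are assumptions made for illustration.

```python
def bbox_area(a, b):
    """Area of the smallest axis-aligned rectangle containing points a and b."""
    return abs(a[0] - b[0]) * abs(a[1] - b[1])


def is_staircase(seq):
    """seq = [p_1, ..., p_2N] as (x, y) tuples; the sequence is closed (p_2N is followed by p_1)."""
    m = len(seq)
    if m < 4 or m % 2:
        return False
    for idx in range(m):
        a, b = seq[idx], seq[(idx + 1) % m]
        # seq[idx] is p_{idx+1}: odd-numbered points share y with their successor,
        # even-numbered points share x with their successor.
        coord = 1 if idx % 2 == 0 else 0
        if a[coord] != b[coord]:
            return False
    for i in range(m):
        for j in range(i + 2, m):
            if i == 0 and j == m - 1:
                continue  # p_1 and p_2N are consecutive in the closed sequence
            if bbox_area(seq[i], seq[j]) <= 1:
                return False
    return True


def mutually_independent(seqs):
    """No unit rectangle may contain points of two different sequences."""
    for s in range(len(seqs)):
        for t in range(s + 1, len(seqs)):
            if any(bbox_area(p, q) <= 1 for p in seqs[s] for q in seqs[t]):
                return False
    return True
```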

Consider any unit covering of a single staircase sequence of 2N points with N rectangles. If 
we cover successive points by horizontal or vertical segments, we obtain a covering with largest 
area zero. We call the covering of a staircase sequence vertical, if the sequence is covered by N
rectangles such that each rectangle contains two points with the same x-coordinate. Similarly,
we call the covering of a staircase sequence horizontal, if the points inside one rectangle have the
same y-coordinate; see Figure 3.

Lemma 6. Any unit covering of a staircase sequence of 2N points with N rectangles must either
be a vertical or a horizontal covering.

Proof: By the definition of staircase sequence no unit rectangle can cover three points. Therefore,
each covering rectangle must contain exactly two consecutive points. Since the rectangles must be
disjoint, either all rectangles cover two points with the same x-coordinate or all rectangles cover
points with the same y-coordinate. □

N unit rectangles are both necessary and sufficient to cover a staircase sequence of 2N points, 
therefore we have: 

Corollary 1. Any unit covering of n mutually independent staircase sequences, each with 2N 
points, that uses nN rectangles must have either a vertical or a horizontal covering for each 
sequence. 

3.2.2 Reduction 

We construct n mutually independent staircase sequences of 2N points each, where n is the number
of variables in the associated 3-SAT instance. Any unit covering of the points with nN rectangles
gives a variable assignment as follows: variable $x_i$ is set to TRUE if the i-th staircase sequence has
a horizontal covering, and FALSE otherwise. Similar to the square case, we add one more point 
for each clause. This point can only be covered by a unit rectangle if the corresponding variable 
assignment satisfies the clause. 

Recall that G(F) is planar, thus there exists a planar embedding of G(F) such that all edges
can be drawn as rectilinear arcs in the unit grid. For simplicity, we first consider the case in which
there is at least one negated and one non-negated literal in each clause (we will show how to deal
with the other types of clauses later). We call the union of all rectilinear arcs that connect some
variable $x_i$ to the 1 ≤ k ≤ m clauses containing $x_i$ a rectilinear tree. That is, we consider the
variable node as the root, and the k clause nodes as the leaves, and we choose an embedding for
each tree such that the root and each internal node has degree exactly three and the whole tree
has exactly N − (k + 1) bends. As G(F) is planar, and we can choose N sufficiently large, this
is always possible. Consider now the rectilinear arc connecting variable $x_i$ with clause $c_j$. We
modify the embedding such that the component of a tree incident to clause $c_j$ is vertical if the
literal $\ell_i$ is negated in $c_j$, and horizontal otherwise, which is also always possible. We then further
perturb the embedding such that no two non-successive bends of any rectilinear arcs have the
same x- or y-coordinate. Finally, to avoid overlap when thickening the trees (as explained in the
next paragraph), we scale the embedding by a factor 2(n + 1); see Figure 4 for an illustration.

Figure 4: As G(F) is planar, we can transform any plane embedding of G(F) into a rectilinear
drawing such that each rectilinear tree has N − 2 bends and non-adjacent bends do not have the
same x- or y-coordinates.

We now replace each rectilinear tree containing N − (k + 1) bends and k + 1 endpoints (one
of them a variable, the k others clause nodes) by a staircase sequence of 2N points as follows
(see Figure 5). We arbitrarily assign to each of the n rectilinear trees in G(F) a unique number
δ ∈ {1, . . . , n} and replace it by a path that is the Minkowski sum of the tree and a square of side
length 2δ. Each rectilinear tree becomes a set of thickened paths that form a rectilinear polygon.
Note that at any internal node (or the root), one of the vertical or horizontal components will split
into two parts. When this happens, we add unit squares to the polygon until no non-consecutive
edges of the polygon have the same x- or y-coordinate, without changing the number of polygon
vertices, which is always possible. Furthermore, two endpoints of one thickened path will lie on
the boundary of one of the other thickened paths. These two points can be ignored. We then
walk along the boundary of the generated polygon, and number the vertices in clockwise order;
let $S_i = (p_1, \ldots, p_{2N})$ be the sequence of generated vertices.

Lemma 7. The sequences $S_1, \ldots, S_n$ of vertices generated as above form n mutually independent
staircase sequences, each of them containing 2N points.

Proof: With the above transformation, we get the following new coordinates for the vertices
of a tree. Let P = (X, Y) be a node of the tree before both the scaling and the thickening,
with integer coordinates. After the scaling with factor 2(n + 1) it has the coordinates
$P' = ((2n + 2)X, (2n + 2)Y)$. After the thickening with factor δ, the node transforms into a
pair of vertices $p_{1,2}$ that lie on a circle C with radius $\sqrt{2}\,\delta$ centered at P'. Depending on whether
P is an endpoint (i.e., the root or a leaf) or a bend of the original tree, these two vertices either
lie on a quadrant or on a diameter of C. As all the numbers involved are integers, we get for each
node P of the tree a vertex pair with coordinates $p_{1,2} = ((2n + 2)X \pm \delta \pm k, (2n + 2)Y \pm \delta \pm k)$.
Here, X and Y are integers, δ ≤ n is the thickening factor, k ∈ {0, 1} is a factor describing the
possible addition of unit squares to avoid having the same coordinates in non-adjacent edges, and
$|2\delta + k| < 2n + 2$. Therefore, two points can be covered by a unit rectangle if and only if they share
one coordinate. This can only happen when both points are adjacent on the generated staircase
sequence. □

Figure 5: Thickening of rectilinear trees results in a staircase sequence. For each endpoint or bend
of the tree two new points at distance $\sqrt{2}\,\delta$ are generated. When an edge is split (dashed segments)
we add unit squares until no non-adjacent edges of the sequence have the same x- or y-coordinate.
We ignore the points that lie on the boundary of another thickened path (grey squares).

By construction, the generated staircase sequences do not intersect, except at the clause
variables. To remove these intersections, we modify the sequences locally around each clause node.
Consider only a small neighborhood of clause $c_j$, and assume that we have a segment of length L
connecting to $c_j$ from the left (see Figure 6). We add a point $p_j$ at the position of node $c_j$ to the
staircase sequence.

Assuming that $p_j = (0, 0)$, we define $L' = L + \delta$ (where δ is the thickness of the path) and move
the three points located at $(-L', \delta)$, $(\delta, \delta)$ and $(\delta, -\delta)$ to the new coordinates $p_1 = (-L', -1/L')$,
$p_2 = (-L, -1/L')$ and $p_3 = (-L, -\delta)$. When connecting from below, right, or above, we use
appropriately rotated versions of the transformation described above.

Figure 6: Local transformation around clause $c_j$ (corresponding to point $p_j$). Staircase before (light
grey) and after (dark grey) moving points to avoid intersection with other staircase sequences.

Figure 7: Local transformation for clause $c_j = \ell_1 \vee \ell_2 \vee \ell_3$: using a negation gadget (inside the
grey box) we can negate a literal in $c_j$.

Points $p_1$ and $p_2$ are called the links between clause $c_j$ and variable $x_i$. The main property of
the construction is that we can cover both link points and the point $p_j$ with a single rectangle of
area one. It is easy to see that the new coordinates of the three moved points are rational and
that the staircase sequences remain mutually independent.
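
The area-one claim for the rectangle covering the clause point and both link points can be verified directly. The following worked step assumes the coordinates $p_j = (0, 0)$, $p_1 = (-L', -1/L')$ and $p_2 = (-L, -1/L')$ with $L' = L + \delta$ and $L, \delta > 0$; the name R for the covering rectangle is introduced here for illustration only.

```latex
\[
  R = [-L', 0] \times [-1/L', 0], \qquad
  \operatorname{area}(R) = L' \cdot \frac{1}{L'} = 1, \qquad
  p_2 = (-L, -1/L') \in R \ \text{ since } -L' < -L < 0 .
\]
```

Under these assumptions R is a unit rectangle, and since $p_j$ and $p_1$ are opposite corners of R, no smaller rectangle contains both.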

Finally, we need to show how to deal with clauses with all three literals either negated or not.
This is important, as we cannot have three horizontal or vertical connections to the same clause
node. Let $c_j = \ell_1 \vee \ell_2 \vee \ell_3$ be such a clause, then we can transform it into the following three
clauses: $(\ell_1 \vee \ell_2 \vee \bar{\ell}_4) \wedge (\ell_3 \vee \ell_4) \wedge (\bar{\ell}_3 \vee \bar{\ell}_4)$. Here, $\ell_4$ is a literal of a new variable, and the two last
clauses assure that $\ell_4$ has the opposite truth assignment of $\ell_3$.

For each such clause, we additionally generate only one variable and two clauses, thus the
asymptotic size of the transformation as well as its planarity are not affected (see Figure 7).
This transformation needs only constant space, hence can be done independently for each literal.
After transforming all such clauses we can proceed as before. 
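
The clause transformation can be sketched and checked by brute force. The following Python sketch assumes a DIMACS-style encoding (−i stands for the negated variable x_i) and chooses the sign of the new literal so that the three-literal replacement clause contains both a negated and a non-negated literal; the function names and this sign convention are assumptions made for illustration.

```python
from itertools import product


def split_uniform_clause(clause, fresh_var):
    """Replace a clause whose three literals all have the same sign by three clauses."""
    l1, l2, l3 = clause
    # l4 is a literal of the new variable with the same sign as l3,
    # so that its negation gives the first clause a literal of the other sign.
    l4 = fresh_var if l3 > 0 else -fresh_var
    return [[l1, l2, -l4], [l3, l4], [-l3, -l4]]


def holds(clauses, bits):
    """bits[i - 1] is the truth value of variable x_i."""
    return all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)


def equivalent_after_split(clause, num_vars=3):
    """The original clause holds iff some value of the new variable satisfies the three new clauses."""
    new = split_uniform_clause(clause, num_vars + 1)
    return all(
        holds([clause], bits) == any(holds(new, bits + (b,)) for b in (False, True))
        for bits in product([False, True], repeat=num_vars)
    )
```

The last two clauses force the new literal to take the opposite value of the third literal, so the first new clause is satisfied exactly when the original clause is.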

Let P be the set of 2nN+m points of the n staircase sequences generated by the transformation 
of a 3-SAT formula with n variables and m clauses. We have arrived at the following lemma. 

Lemma 8. A planar 3-SAT formula in n variables is satisfiable if and only if the set P of 2nN + m
points generated as above can be covered with nN unit rectangles.

Proof: (⇐): Given a unit covering of P, we generate a variable assignment as follows: each variable
is set to TRUE if its associated staircase sequence has a horizontal covering, FALSE otherwise.
As any unit covering of P is a unit covering of the n mutually independent staircase sequences,
this assignment is valid by Corollary 1.

We now show that this variable assignment satisfies all clauses; by construction, any rectangle 
that covers at least four points has area larger than one, thus no such rectangle can be in a unit 
covering. Since there are 2nN + m points in the construction and we want to cover them with nN 
rectangles, there must be exactly m rectangles, each covering three points. No three points from 
a variable gadget can be covered with a unit rectangle, thus each of the m rectangles must cover 
two variable points and the point pj corresponding to clause Cj. 
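
The counting step can be made explicit. In the following worked step, $a_3$ denotes the number of rectangles that cover three points; this symbol is introduced here for illustration only. Every rectangle covers at most three points, so the remaining $nN - a_3$ rectangles cover at most two points each:

```latex
\[
  2nN + m \;\le\; 3a_3 + 2(nN - a_3) \;=\; 2nN + a_3
  \quad\Longrightarrow\quad a_3 \ge m .
\]
```

Since no three variable points fit into a unit rectangle, each of these rectangles must contain a clause point, which gives the m three-point rectangles used in the argument.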

By construction of the clause node $p_j$, such a covering is only possible if $p_j$ and any two links are
covered by the same rectangle. Let $x_i$ be the variable with two links that are covered together with
$p_j$ by one unit rectangle. If the literal of $x_i$ is not negated in $c_j$, the links share the y-coordinate.
Since both links are covered by the same rectangle, the gadget of $x_i$ must be horizontally covered,
which corresponds to setting variable $x_i$ to TRUE in our variable assignment. Since $x_i$ is set to
TRUE and literal $\ell_i$ is not negated, clause $c_j$ is satisfied. The case with negated $\ell_i$ is analogous.

(⇒): Given a satisfying variable assignment, we generate a corresponding covering for the variable
gadgets. Each clause $c_j$ is satisfied at least once, thus we can cover point $p_j$ together with the link
points of the variable that satisfies $c_j$ with one unit rectangle. □

For the (p, k)-Rectangle Covering problem we can give the following inapproximability
result:

Lemma 9. If the 3-SAT formula is not satisfiable, any covering of the n staircase sequences with
nN rectangles has at least one arbitrarily large rectangle.

Proof: We scale the transformation by an arbitrarily large, constant factor M before the local 
transformation in the neighborhood of the variables is done. If the 3-SAT formula is satisfiable, 
a unit covering is possible. However, consider any covering of a non-satisfiable 3-SAT instance: 
since the thick paths become arbitrarily thick, horizontal and vertical coverings are forced, and 
thus each covering still gives a valid variable assignment. 

We must enlarge the rectangles such that they cover all clause points p_j. Since the instance is
non-satisfiable, for any variable assignment there exists a clause C_j = ℓ_1 ∨ ℓ_2 ∨ ℓ_3 with vertically
covered variables if the literal is not negated, and horizontally covered variables otherwise. The
minimum-area rectangle that includes p_j and two points sharing a y-coordinate (if the literal is
not negated) includes the points p_1 and p_2, and it has area M^2 L'δ = M^2 δL + M^2 δ^2, which is
arbitrarily large. □



Theorem 2. Given n points in the plane, let p ∈ N be part of the input and k be any fixed integer
with n − k ∈ Ω(n). Then, the (p, k)-Rectangle Covering problem is NP-hard. Moreover, the
(p, k)-Rectangle Covering problem admits no constant-factor polynomial-time approximation
algorithm.

4 Exact Algorithms for p ≤ 3

In this section, we present algorithms to efficiently compute the solution for the (p, k)-Box
Covering problem for small values of p. For simplicity, we assume throughout the following sections
that no two points have the same x- or y-coordinate, and we assume furthermore in the description
of our algorithms that we want to cover exactly n − k points. An adaptation to cover at least
n − k points is straightforward. Note that for p ∈ {2, 3}, we can always find an axis-parallel line
that separates one box from the others. We exploit this property for a divide-and-conquer type of
approach.

4.1 Covering Points with Squares 

We first want to cover n − k points of P with p squares. With a simple observation, we can improve
an existing algorithm for computing the optimal solution of the (1, k)-Square Covering problem,
which will function as our base case. Using certain monotonicity properties, we can apply binary
search.

4.1.1 (1, k)-Square Covering

Previously, an O(n log n) expected time algorithm for the (1, k)-Square Covering problem was
presented by Chan [5]. We make use of Chan's algorithm as a subroutine of our algorithms.

A point p ∈ P is called (k + 1)-extreme if either its x- or y-coordinate is among the k + 1
smallest or largest in P. Let E(P) be the set of all (k + 1)-extreme points of P.

Lemma 10. For a given set P of n points in the plane, we can compute the set E(P) of all
(k + 1)-extreme points of P in O(n) time.

We can use the standard selection algorithm [9] to select the point p_L of P with the (k + 1)-st
smallest x-coordinate in linear time. We then go through P again to find all points with
x-coordinate smaller than that of p_L. Finding the points p_R, p_T, p_B and computing the rest of
E(P) is symmetric.
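
As a minimal illustration of the definition of E(P), the following Python sketch collects the (k + 1)-extreme points of a small point set. It is not the linear-time procedure described above: for brevity it swaps in `heapq.nsmallest` and `heapq.nlargest`, which take O(n log k) time instead of O(n). The function name `extreme_points` and the tuple representation of points are assumptions made for this example only.

```python
import heapq


def extreme_points(points, k):
    """Return E(P): all points whose x- or y-coordinate is among the
    k + 1 smallest or k + 1 largest in `points` (a list of (x, y) tuples).

    Illustrative only: heap-based selection is used here instead of
    linear-time selection, so this runs in O(n log k), not O(n).
    """
    count = min(k + 1, len(points))

    def by_x(p):
        return p[0]

    def by_y(p):
        return p[1]

    extremes = set()
    extremes.update(heapq.nsmallest(count, points, key=by_x))  # leftmost
    extremes.update(heapq.nlargest(count, points, key=by_x))   # rightmost
    extremes.update(heapq.nsmallest(count, points, key=by_y))  # bottommost
    extremes.update(heapq.nlargest(count, points, key=by_y))   # topmost
    return extremes
```

Since each of the four selections contributes at most k + 1 points and a point may be extreme in more than one direction, the returned set has at most 4k + 4 elements.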

The following lemma shows that the left side of the optimal solution of the (1, k)-Square
Covering problem lies on or to the left of the vertical line through p_L, and that the right side lies
on or to the right of the vertical line through p_R. Similarly, the top side of the optimal solution
lies on or above the horizontal line through p_T, and the bottom side lies on or below the horizontal
line through p_B.

Lemma 11. The optimal square B* that solves the (1, k)-Square Covering problem is
determined by the points of E(P) only.

Proof: The covering square is convex, hence all outliers must come from outside the optimal 
square. As we want to minimize the area, there exists an optimal square B* such that at least 
three edges of B* each contain one point of P. If one edge, say the top edge, is determined by a 
point p ∈ P \ E(P), it means that there are at least k + 1 outliers above B*, which is not allowed.
□ 

Using this lemma, we obtain an improved running time as follows: 

Theorem 3. Given a set P of n points in the plane, the (1, k)-Square Covering problem can
be solved in O(n + k log k) expected time using O(n) space.

Proof: We first compute the set of extreme points E(P) in linear time and then run Chan's
algorithm on the set E(P). The time bound follows directly, since |E(P)| ≤ 4k + 4. □



4.1.2 (2, k)-Square Covering

The following observation is crucial to solve the (2, k)-Square Covering problem, where we look
for two disjoint squares that cover n − k points.

Observation 1. For any two disjoint axis-aligned squares in the plane, there exists an axis-parallel
line ℓ that separates them.

This observation implies that there is always an axis-parallel line ℓ that separates the two
optimal squares (B_1^*, B_2^*) of the solution of a (2, k)-Square Covering problem. Let ℓ^+ be the
halfplane defined by ℓ that contains B_1^*. Let P^+ be the set of points of P that lie in ℓ^+ (including
points on ℓ), and let k^+ be the number of outliers admitted by the solution of the (2, k)-Square
Covering problem that lie in ℓ^+. Then there is always an optimal solution of the (1, k^+)-Square
Covering problem for P^+ with size smaller than or equal to that of B_1^*. The same argument
also holds for the other halfplane ℓ^-, where we have B_2^* and k^- = k − k^+. Thus, the pair of
optimal solutions B_1 of the (1, k^+)-Square Covering problem and B_2 of the (1, k^-)-Square
Covering problem is an optimal solution of the original (2, k)-Square Covering problem.

Lemma 12. There exists an axis-parallel line ℓ and a non-negative integer k' ≤ k such that an optimal
solution of the (2, k)-Square Covering problem for P consists of the optimal solution of the
(1, k')-Square Covering problem for P^+ and the (1, k − k')-Square Covering problem for
P^-.

We assume w.l.o.g. that ℓ is vertical, and we associate ℓ with m, the number of points that
lie to the left of (or on) ℓ. Let p_1, p_2, ..., p_n be the list of points in P sorted by x-coordinate.
Then ℓ partitions the points of P into two subsets, a left point set P_L(m) = {p_1, ..., p_m} and a
right point set P_R(m) = {p_{m+1}, ..., p_n}, see Figure 8. The optimal left square is a solution of
the (1, k')-Square Covering problem for P_L(m) for some 0 ≤ k' ≤ k, and the optimal right square is
a solution of the (1, k − k')-Square Covering problem for P_R(m).

We can efficiently compute the optimal solutions for P_L(m) and P_R(m) in each halfplane of
a vertical line ℓ using the above (1, k)-Square Covering algorithm. However, as we have to
consider many partitioning lines, it is important to find an efficient way to compute the
(k + 1)-extreme points for each P_L(m) and P_R(m) corresponding to a particular line ℓ. For this we use
Chazelle's segment dragging query algorithm [7].

Figure 8: For given k' and m, the optimal m* lies on the side with the larger square.



Lemma 13 ([7]). Given a set P of n points in the plane, we can preprocess it in O(n log n) time
and O(n) space such that, for any axis-aligned orthogonal range query Q, we can find the point p
of P that has the highest y-coordinate of all points inside the query range Q in O(log n) time.

We repeatedly apply Lemma 13 as follows: We start to query with a rectangle Q that has its upper
boundary at +∞ to find the topmost point. We then set the upper boundary of the rectangle to
the y-coordinate of the topmost point and query again with the new rectangle. Doing this k + 1
times gives the k + 1 points with highest y-coordinate in any halfplane. We rotate the set P to
find all other elements of E(P) in the corresponding halfplane, and we get the following time bound.

Corollary 2. After O(n log n) preprocessing time, we can compute the sets E(P_L(m)) and E(P_R(m))
in O(k log n) time for any given m.
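
The repeated-query idea can be sketched as follows. The function `topmost_in_range` is a naive O(n) stand-in for the O(log n) query of Lemma 13 and is not Chazelle's data structure; both function names and the strict comparison against the previous answer (which relies on the distinct-coordinate assumption) are choices made for this illustration.

```python
def topmost_in_range(points, x_lo, x_hi, y_below):
    """Naive stand-in for the query of Lemma 13: return the highest point
    with x in [x_lo, x_hi] and y strictly below y_below, or None."""
    best = None
    for p in points:
        if x_lo <= p[0] <= x_hi and p[1] < y_below:
            if best is None or p[1] > best[1]:
                best = p
    return best


def top_k_plus_1(points, x_lo, x_hi, k):
    """Collect the k + 1 topmost points in the slab x_lo <= x <= x_hi by
    lowering the upper boundary of the query range after each answer."""
    found = []
    y_below = float("inf")          # first query: upper boundary at +infinity
    for _ in range(k + 1):
        p = topmost_in_range(points, x_lo, x_hi, y_below)
        if p is None:               # fewer than k + 1 points in the slab
            break
        found.append(p)
        y_below = p[1]              # next query stays strictly below p
    return found
```

With a structure that answers each query in O(log n) time, the loop above would account for the O(k log n) bound per direction stated in Corollary 2.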

Before presenting our algorithm we need the following lemma: 

Lemma 14. For a fixed k', the area of the solution of the (1, k')-Square Covering problem for
P_L(m) is an increasing function of m.

Proof: Consider the set P_L(m + 1) and the optimal square B* of the (1, k')-Square Covering
problem for P_L(m + 1). Clearly, P_L(m + 1) is a superset of P_L(m), as it contains one more
point, p_{m+1}. Since k' is fixed, the square B* has k' outliers in P_L(m + 1). If the interior of B*
intersects the vertical line ℓ through p_m, we translate B* horizontally to the left until it stops
intersecting ℓ. Let B be the translated copy of B*; then B lies in the left halfplane of ℓ and there
are at most k' outliers admitted by B among the points in P_L(m). Therefore we can shrink or
translate B and get a square inside the left halfplane of ℓ that has exactly k' outliers and a size
at most that of B*. Thus, the optimal square for P_L(m) has a size smaller than or equal to that of B*. □

Lemma 14 immediately implies the following corollary.

Corollary 3. Let (B_1, B_2) be the solution of the (2, k)-Square Covering problem for a separating
line ℓ with index m. Then, the index m* of the optimal separating line ℓ* is at most m if the
left square B_1 is larger than the right square B_2; otherwise it holds that m* ≥ m.

To solve the (2, k)-Square Covering problem, we start with the vertical line ℓ at the median
of the x-coordinates of all points in P. For a given m, we first compute the sets E(P_L(m))
and E(P_R(m)). Then we use these sets in the call to the (1, k')-Square Covering problem
for P_L(m) and the (1, k − k')-Square Covering problem for P_R(m), respectively, and solve
the subproblems independently. The solutions of these subproblems give the first candidate for the
solution of the (2, k)-Square Covering problem, and we now compare the areas of the two
obtained squares. According to Corollary 3, we can discard one of the halfplanes created by ℓ
(see Figure 8); hence, we can use binary search to find the optimal index m* for the given k'. As
the value of k' that leads to the overall optimal solution is unknown, we need to do this for every
possible k'. Finally, we also need to examine horizontal separating lines by reversing the roles of
x- and y-coordinates.
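
The binary search over the split index m for one fixed k' can be sketched in Python as follows. This is only an illustration under stated assumptions: `min_square_side` is a brute-force stand-in for the (1, k)-Square Covering subroutine (it is not Chan's algorithm and does not use E(P) or segment dragging queries), it covers at least rather than exactly the required number of points, and it compares side lengths, which order squares the same way as areas. All function names are hypothetical.

```python
def min_square_side(points, outliers):
    """Brute-force side length of a smallest axis-aligned square covering
    at least len(points) - outliers of the given (x, y) points."""
    need = len(points) - outliers
    if need <= 0:
        return 0
    best = float("inf")
    for x0 in {p[0] for p in points}:        # candidate left edge
        for y0 in {p[1] for p in points}:    # candidate bottom edge
            sides = sorted(max(px - x0, py - y0)
                           for px, py in points if px >= x0 and py >= y0)
            if len(sides) >= need:
                best = min(best, sides[need - 1])
    return best


def best_vertical_split(points, k, k_left):
    """Return (side, m): the smallest achievable larger-square side and a
    split index m, with k_left outliers on the left and k - k_left on the
    right of a vertical separating line."""
    pts = sorted(points)                     # sort by x-coordinate
    n = len(pts)

    def left(m):                             # non-decreasing in m
        return min_square_side(pts[:m], k_left)

    def right(m):                            # non-increasing in m
        return min_square_side(pts[m:], k - k_left)

    lo, hi = 0, n                            # find smallest m with left >= right
    while lo < hi:
        mid = (lo + hi) // 2
        if left(mid) >= right(mid):
            hi = mid                         # optimum is at mid or to its left
        else:
            lo = mid + 1                     # optimum is to the right of mid
    candidates = [lo] + ([lo - 1] if lo > 0 else [])
    return min((max(left(m), right(m)), m) for m in candidates)
```

A full solver along the lines described above would call such a routine for every k' from 0 to k and repeat the process with the roles of x and y exchanged, keeping the best result.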

Theorem 4. For a set P of n points in the plane, we can solve the (2, k)-Square Covering
problem in O(n log n + k^2 log^2 k) expected time using O(n) space.

Proof: After O(n log n) preprocessing time, we have O(k log n) different queries, each of which
takes O(k log n) time, which gives a total running time of O(n log n + k^2 log^2 n). We can show that
this is equal to O(n log n + k^2 log^2 k) by distinguishing the following two cases:

If k^4 ≤ n, then it holds for the second term that k^2 log^2 n ≤ √n log^2 n ∈ O(n), so the second
term is asymptotically smaller than the first, and we have O(n log n + k^2 log^2 n) = O(n log n).

If n < k^4, then log n < log k^4 ∈ O(log k), so the second term is asymptotically bounded by
O(k^2 log^2 k), and altogether we have in this case O(n log n + k^2 log^2 n) = O(n log n + k^2 log^2 k).
Hence, in both cases the asymptotic time bound is O(n log n + k^2 log^2 k). □



4.1.3 (3, k)-Square Covering

The above solution for the (2, k)-Square Covering problem suggests a recursive approach for
the general (p, k)-Square Covering case: Find an axis-parallel line that separates one square
from the others and recursively solve the induced subproblems. We can do this for p = 3, as
Observation 1 can be generalized as follows.

Observation 2. For any three pairwise-disjoint, axis-aligned squares in the plane, there always
exists an axis-parallel line ℓ that separates one square from the others.

Again we assume that the separating line ℓ is vertical and that the left halfplane only contains
one square. Since Corollary 3 can be generalized to (3, k)-Square Covering, we solve this case
as before: fix the number of outliers permitted in the left halfplane to k' and iterate k' from 0
to k to obtain the optimal k*. For each possible k', we recursively solve the two subproblems to
the left and right of ℓ and use the solutions to obtain the optimal index m* such that the area
of the largest square is minimized. Preprocessing consists of sorting the points of P in both x-
and y-coordinates and computing the segment dragging query structure, which can be done in
O(n log n) time.

In the left halfplane, we solve the (1, k') subproblem as before; its running time is subsumed
by the time needed to solve the (2, k − k') subproblem in the right halfplane. Each (2, k − k')-Square
Covering subproblem is solved as described above, except that preprocessing in the
recursive steps is no longer needed: The segment dragging queries can be performed directly since
the preprocessing has been done at the higher level. Also, for the binary search, we can use the
sorted list of all points in P, which is a superset of P_R(m).

This algorithm has a total time complexity of O(n log n + k^3 log^3 n) = O(n log n + k^3 log^3 k)
(as before, by distinguishing k^6 ≤ n from k^6 > n).
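
The case distinction mentioned here is not spelled out in the text. A sketch of it, mirroring the two cases in the proof of Theorem 4, could read as follows; the explicit constant 216 = 6^3 is worked out for this illustration and does not appear in the original.

```latex
\begin{align*}
k^6 \le n &\;\Longrightarrow\; k^3 \log^3 n \le \sqrt{n}\,\log^3 n \in O(n)
          \;\Longrightarrow\; O(n \log n + k^3 \log^3 n) = O(n \log n), \\
k^6 > n   &\;\Longrightarrow\; \log n < 6 \log k
          \;\Longrightarrow\; k^3 \log^3 n < 216\, k^3 \log^3 k \in O(k^3 \log^3 k).
\end{align*}
```

In both cases the running time is bounded by O(n log n + k^3 log^3 k).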

Theorem 5. For a set P of n points in the plane, we can solve the (3, k)-Square Covering
problem in O(n log n + k^3 log^3 k) expected time using O(n) space.

4.2 Covering Points with Rectangles 

We now look at the (p, k)-Rectangle Covering problem, where we want to cover n − k points
with pairwise-disjoint rectangles. It is straightforward to extend Lemma 11 as well as
Observations 1 and 2 to rectangles, so we can use the same approach to solve the (p, k)-Rectangle
Covering problem as for the (p, k)-Square Covering problem when p ≤ 3.

Figure 9: Counterexamples. Left: In R^2, no splitting line may exist for p ≥ 4. Right: In R^3, no
splitting hyperplane may exist for p ≥ 3.

Chan's algorithm [5], however, does not apply to the (1, k)-Rectangle Covering problem.
This means that once we have computed the set of (k + 1)-extreme points, we need to test all
rectangles that cover n − k points. Our approach is an exhaustive search: We store the points
of E(P) separately in four sorted lists, the top k + 1 points in T(P), the bottom k + 1 points in
B(P), and the left and right k + 1 points in L(P) and R(P), respectively. Note that some points
may belong to more than one set.

We first create a vertical slab by drawing two vertical lines through one point of L(P) and
R(P) each. All k' points outside this slab are outliers, which leads to k − k' outliers that are still
permitted inside the slab. We now choose two horizontal lines through points in T(P) and B(P)
that lie inside the slab, such that the rectangle that is formed by all four lines admits exactly k
outliers. It is easy to see that whenever the top line is moved downwards, the bottom line
must also move downwards, as we need to maintain the correct number of outliers throughout. Inside
each of the O(k^2) vertical slabs, there are O(k) horizontal line pairs we need to examine,
hence we can find the smallest rectangle covering n − k points in O(k^3) time when the sorted lists
of E(P) are given. This preprocessing takes O(n + k log k) time. We get the following theorem:

Theorem 6. Given a set P of n points in the plane, we can solve the (1, k)-Rectangle
Covering problem in O(n + k^3) time using O(n) space.
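
The slab enumeration with the coupled top and bottom lines can be sketched in Python as follows. For brevity, this illustration works on all of P instead of the four sorted lists of E(P), and it re-sorts the points of each slab, so it runs in O(k^2 n log n) time rather than the O(n + k^3) time of Theorem 6. The function name `min_rectangle_area` is an assumption of this example, and distinct coordinates are assumed as in the text.

```python
def min_rectangle_area(points, k):
    """Smallest area of an axis-aligned rectangle covering exactly
    len(points) - k of the given (x, y) points.

    Illustrative only: enumerates slabs over all of P, not over E(P)."""
    pts = sorted(points)                       # sorted by x-coordinate
    n = len(pts)
    if n - k <= 1:
        return 0                               # a degenerate rectangle suffices
    best = float("inf")
    for drop_left in range(k + 1):             # outliers left of the slab
        for drop_right in range(k + 1 - drop_left):   # outliers right of it
            slab = pts[drop_left:n - drop_right]
            width = slab[-1][0] - slab[0][0]
            ys = sorted(p[1] for p in slab)
            inside = k - drop_left - drop_right       # outliers left to place
            # Whenever the top line moves down by one point, the bottom
            # line moves down by one point as well.
            for drop_top in range(inside + 1):
                top = ys[len(ys) - 1 - drop_top]
                bottom = ys[inside - drop_top]
                best = min(best, width * (top - bottom))
    return best
```

For example, for the points (0, 0), (1, 1), (2, 2), (10, 10) and k = 1, the sketch discards (10, 10) and returns the area 4 of the rectangle spanned by the remaining three points.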

Note that this approach leads to the same running time we would get by simply bootstrapping 
any other existing rectangle covering algorithm [TJ[5T| to the set E(P), which has independently 
been done in [3]. Note further that for the case of squares, it is possible to reduce the number of 
vertical slabs that need to be examined to O(k) only, which would lead to a total running time of
O(n + k^2).

The (p, k)-Rectangle Covering problem for p ∈ {2, 3} can be solved with the same recursive
approach as the corresponding (p, k)-Square Covering problem, using the (1, k)-Rectangle
Covering algorithm described above as the base case. The running times change as follows.

Theorem 7. Given a set P of n points in the plane, we can solve the (2, k)-Rectangle
Covering problem in O(n log n + k^4 log k) time, and the (3, k)-Rectangle Covering problem
in O(n log n + k^5 log^2 k) time. In both cases we use O(n) space.

5 Concluding remarks 

In this paper we have extended the well examined axis-aligned box covering problem to allow at 
most k outliers. 

Our algorithms for p ≤ 3 can be generalized to functions other than minimum area (e.g.,
minimizing the maximum perimeter of the boxes) as long as this function has some monotonicity
property that allows us to solve the subproblems induced by the p boxes independently.

To solve the (p, k)-Square Covering problems we use the randomized technique of Chan [5]
as a subroutine, and thus our algorithms are randomized as well. Chan [5] mentioned that his
algorithm can be derandomized, adding a logarithmic factor. Thus, our algorithms can also be
made deterministic, adding an O(log k) factor to the second term of their running times, see the
proof of our Theorem 2J 

We can generalize all algorithms to higher dimensions where the partitioning line becomes a
hyperplane. However, there is a simple example (see Figure 9, right) showing that neither the
(3, k)-Square Covering nor the (3, k)-Rectangle Covering problem admits a partitioning
hyperplane for d > 2; hence our algorithm can only be used for p = 1, 2 in higher dimensions.

Our algorithms do not directly extend to the case p ≥ 4, as Observation 1 does not hold for
the general case, see Figure 9, left. Although no splitting line may exist, there always exists a
quadrant separating a single box from the others. This property again makes it possible to use
recursion to solve any (p, k)-Square Covering or (p, k)-Rectangle Covering problem.

A natural extension of our idea is to allow either arbitrarily oriented squares and rectangles
or to allow them to overlap. Both appear to be difficult within our framework, as we make use
of the set of (k + 1)-extreme points, which is hard to maintain under rotations; also, we cannot
restrict our attention to only these points when considering overlapping squares or rectangles.